We're thrilled to share an exciting dataset about Airbnb rentals in New York City, USA, spanning from 2003 to 2022. This dataset is a treasure trove of insights into one of the most dynamic cities on the planet.
After thorough cleaning, we've narrowed it down to approximately 97,000 records across 19 columns. In this analysis, we'll explore:
Where are these Airbnb listings situated, and which neighborhoods are the hottest destinations?
How do prices vary depending on the type of accommodation and location?
What are the secrets behind a host's ratings and reviews?
Let's begin!
# Importing libraries for data analysis and visualization
# NumPy: Library for numerical computations and array manipulation
import numpy as np
# Pandas: Data manipulation and analysis library
import pandas as pd
# Seaborn: Data visualization library built on Matplotlib
import seaborn as sns
# Matplotlib.pyplot: Submodule of Matplotlib for creating plots
import matplotlib.pyplot as plt
# Plotly.graph_objs: Part of the Plotly library for interactive plots
import plotly.graph_objs as go
# Plotly.express: High-level interface for interactive visualizations using Plotly
import plotly.express as px
# os: Module for interacting with the operating system
import os
# WordCloud: Library for creating word clouds from text data
from wordcloud import WordCloud
# plotly.subplots.make_subplots: Function for creating subplots within a single Plotly figure
from plotly.subplots import make_subplots
# Jupyter Notebook magic command for displaying Matplotlib plots inline
%matplotlib inline
# Read data from the CSV file into the 'aib' DataFrame
aib = pd.read_csv("Airbnb_Open_Data.csv")
# Create a new DataFrame 'aib1' with a column 'house_rules' from 'aib'
aib1 = pd.DataFrame()
aib1["house_rules"] = aib["house_rules"]
/var/folders/sh/dbwfh3gx18n7xjl99467b4_m0000gn/T/ipykernel_3275/311372415.py:2: DtypeWarning: Columns (25) have mixed types. Specify dtype option on import or set low_memory=False.
#To display the first 10 rows of the aib DataFrame
aib.head(10)
| id | NAME | host id | host_identity_verified | host name | neighbourhood group | neighbourhood | lat | long | country | ... | service fee | minimum nights | number of reviews | last review | reviews per month | review rate number | calculated host listings count | availability 365 | house_rules | license | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001254 | Clean & quiet apt home by the park | 80014485718 | unconfirmed | Madaline | Brooklyn | Kensington | 40.64749 | -73.97237 | United States | ... | $193 | 10.0 | 9.0 | 10/19/2021 | 0.21 | 4.0 | 6.0 | 286.0 | Clean up and treat the home the way you'd like... | NaN |
| 1 | 1002102 | Skylit Midtown Castle | 52335172823 | verified | Jenna | Manhattan | Midtown | 40.75362 | -73.98377 | United States | ... | $28 | 30.0 | 45.0 | 5/21/2022 | 0.38 | 4.0 | 2.0 | 228.0 | Pet friendly but please confirm with me if the... | NaN |
| 2 | 1002403 | THE VILLAGE OF HARLEM....NEW YORK ! | 78829239556 | NaN | Elise | Manhattan | Harlem | 40.80902 | -73.94190 | United States | ... | $124 | 3.0 | 0.0 | NaN | NaN | 5.0 | 1.0 | 352.0 | I encourage you to use my kitchen, cooking and... | NaN |
| 3 | 1002755 | NaN | 85098326012 | unconfirmed | Garry | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | United States | ... | $74 | 30.0 | 270.0 | 7/5/2019 | 4.64 | 4.0 | 1.0 | 322.0 | NaN | NaN |
| 4 | 1003689 | Entire Apt: Spacious Studio/Loft by central park | 92037596077 | verified | Lyndon | Manhattan | East Harlem | 40.79851 | -73.94399 | United States | ... | $41 | 10.0 | 9.0 | 11/19/2018 | 0.10 | 3.0 | 1.0 | 289.0 | Please no smoking in the house, porch or on th... | NaN |
| 5 | 1004098 | Large Cozy 1 BR Apartment In Midtown East | 45498551794 | verified | Michelle | Manhattan | Murray Hill | 40.74767 | -73.97500 | United States | ... | $115 | 3.0 | 74.0 | 6/22/2019 | 0.59 | 3.0 | 1.0 | 374.0 | No smoking, please, and no drugs. | NaN |
| 6 | 1004650 | BlissArtsSpace! | 61300605564 | NaN | Alberta | Brooklyn | Bedford-Stuyvesant | 40.68688 | -73.95596 | United States | ... | $14 | 45.0 | 49.0 | 10/5/2017 | 0.40 | 5.0 | 1.0 | 224.0 | Please no shoes in the house so bring slippers... | NaN |
| 7 | 1005202 | BlissArtsSpace! | 90821839709 | unconfirmed | Emma | Brooklyn | Bedford-Stuyvesant | 40.68688 | -73.95596 | United States | ... | $212 | 45.0 | 49.0 | 10/5/2017 | 0.40 | 5.0 | 1.0 | 219.0 | House Guidelines for our BnB We are delighted ... | NaN |
| 8 | 1005754 | Large Furnished Room Near B'way | 79384379533 | verified | Evelyn | Manhattan | Hell's Kitchen | 40.76489 | -73.98493 | United States | ... | $204 | 2.0 | 430.0 | 6/24/2019 | 3.47 | 3.0 | 1.0 | 180.0 | - Please clean up after yourself when using th... | NaN |
| 9 | 1006307 | Cozy Clean Guest Room - Family Apt | 75527839483 | unconfirmed | Carl | Manhattan | Upper West Side | 40.80178 | -73.96723 | United States | ... | $58 | 2.0 | 118.0 | 7/21/2017 | 0.99 | 5.0 | 1.0 | 375.0 | NO SMOKING OR PETS ANYWHERE ON THE PROPERTY 1.... | NaN |
10 rows × 26 columns
#Total data in the dataset.
len(aib)
102599
#count the nan values
nan_counts = aib.isna().sum()
nan_counts
id 0 NAME 250 host id 0 host_identity_verified 289 host name 406 neighbourhood group 29 neighbourhood 16 lat 8 long 8 country 532 country code 131 instant_bookable 105 cancellation_policy 76 room type 0 Construction year 214 price 247 service fee 273 minimum nights 409 number of reviews 183 last review 15893 reviews per month 15879 review rate number 326 calculated host listings count 319 availability 365 448 house_rules 52131 license 102597 dtype: int64
#Dropping the useless columns
columns_to_drop = ['license', 'country', 'country code','last review', 'host id','house_rules', 'reviews per month']
aib.drop(columns=columns_to_drop, axis=1, inplace=True)
#We need to delete rows with nan values from the dataset.
aib = aib.dropna()
#Checking data types of the data.
aib.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 99502 entries, 0 to 102598 Data columns (total 19 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 99502 non-null int64 1 NAME 99502 non-null object 2 host_identity_verified 99502 non-null object 3 host name 99502 non-null object 4 neighbourhood group 99502 non-null object 5 neighbourhood 99502 non-null object 6 lat 99502 non-null float64 7 long 99502 non-null float64 8 instant_bookable 99502 non-null object 9 cancellation_policy 99502 non-null object 10 room type 99502 non-null object 11 Construction year 99502 non-null float64 12 price 99502 non-null object 13 service fee 99502 non-null object 14 minimum nights 99502 non-null float64 15 number of reviews 99502 non-null float64 16 review rate number 99502 non-null float64 17 calculated host listings count 99502 non-null float64 18 availability 365 99502 non-null float64 dtypes: float64(8), int64(1), object(10) memory usage: 15.2+ MB
# Convert the 'Construction year' column to string
aib['Construction year'] = aib['Construction year'].astype(str)
# Extract the numeric part and convert it to integers
aib['Construction year'] = aib['Construction year'].str.extract(r'(\d+)').astype(int)
#Changing the all values in the neighbourhood group column to the lower case.
aib['neighbourhood group'] = aib['neighbourhood group'].str.lower()
#Fixing the spelling mistake 'brookln' in the neighbourhood group column
aib['neighbourhood group'] = aib['neighbourhood group'].str.replace('brookln', 'brooklyn')
# find and replace the dollar sign with empty string
aib['price'] = aib['price'].str.replace(r'\$', '', regex=True)
#replace the commas sign with empty string
aib['price'] = aib['price'].str.replace(r',', '', regex=True)
#changing the data type, object to int
aib['price'] = pd.to_numeric(aib['price'])
#Removing the dollar sign and commas from the service fee column, then converting it to a numeric type.
aib['service fee'] = aib['service fee'].str.replace(r'\$', '', regex=True)
aib['service fee'] = aib['service fee'].str.replace(r',', '', regex=True)
aib['service fee'] = pd.to_numeric(aib['service fee'])
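The same cleaning steps are applied to both 'price' and 'service fee', so they could be folded into one helper. The following is only a sketch: `parse_currency` is a hypothetical name, not part of the notebook, and it assumes the values look like '$1,200'. It could then be applied to a column with something like `aib['price'].map(parse_currency)`.

```python
import re

def parse_currency(value):
    """Convert a string such as '$1,200 ' to a float; return None if it cannot be parsed."""
    if value is None:
        return None
    # Drop everything except digits, the decimal point, and a minus sign
    cleaned = re.sub(r"[^0-9.\-]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        # Nothing numeric was left (for example an empty string)
        return None

print(parse_currency("$1,200 "))
print(parse_currency("$193"))
print(parse_currency("n/a"))
```

Returning None for unparseable input keeps bad values visible so they can be dropped or inspected later, rather than raising in the middle of the column.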
#Count the negative values in availability 365 (for reference only, they are not modified), then keep only rows with at most 365 available days.
negative_values_count = aib[aib['availability 365'] < 0]['availability 365'].value_counts().sum()
aib = aib[aib['availability 365'] <= 365]
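If out-of-range availability values should be corrected rather than counted or dropped, one option is to clamp them to the valid range. This is a sketch of that alternative, not what the cell above does; `clamp_availability` is a hypothetical helper. In pandas the vectorised equivalent would be `Series.clip(lower=0, upper=365)`.

```python
def clamp_availability(days, lower=0, upper=365):
    """Limit an availability value to the range [lower, upper]."""
    return max(lower, min(upper, days))

# Made-up sample values: negative, in range, and above 365
sample = [-10, 0, 180, 365, 426]
print([clamp_availability(d) for d in sample])
```

Clamping keeps every row, whereas filtering removes listings with implausible values; which is preferable depends on how much those rows are trusted otherwise.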
# Convert 'minimum nights' column to integer data type
aib['minimum nights'] = aib['minimum nights'].astype(int)
# Convert 'number of reviews' column to integer data type
aib['number of reviews'] = aib['number of reviews'].astype(int)
# Convert 'review rate number' column to integer data type
aib['review rate number'] = aib['review rate number'].astype(int)
# Convert 'availability 365' column to integer data type
aib['availability 365'] = aib['availability 365'].astype(int)
# Convert 'calculated host listings count' column to integer data type
aib['calculated host listings count'] = aib['calculated host listings count'].astype(int)
# Count the number of listings ('id') in each 'room type' category
# Sort the listing counts in descending order to show the most common room types first
aib.groupby('room type')['id'].count().sort_values(ascending=False)
room type Entire home/apt 50668 Private room 43940 Shared room 2119 Hotel room 112 Name: id, dtype: int64
The most common room types are 'Entire home/apt' and 'Private room', which together account for 94,608 listings, or 97.7% of the total. 'Shared room' and 'Hotel room' make up only 2.3%, with 'Hotel room' having just 112 listings, or 0.1% of the total.
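The shares can be checked directly from the counts printed by the groupby above. A minimal sketch using only those four numbers:

```python
# Counts copied from the room type output above
counts = {
    "Entire home/apt": 50668,
    "Private room": 43940,
    "Shared room": 2119,
    "Hotel room": 112,
}

total = sum(counts.values())
# Percentage share of each room type, rounded to one decimal place
shares = {room: round(100 * n / total, 1) for room, n in counts.items()}

print(total)
print(shares)
```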
# Count the number of occurrences of each 'Construction year' and reset the index
year_counts = aib['Construction year'].value_counts().reset_index()
# Rename the columns to 'Construction year' and 'Count'
year_counts.columns = ['Construction year', 'Count']
# Sort the year counts by 'Construction year'
year_counts = year_counts.sort_values(by='Construction year')
# Create a line plot using Plotly Express (imported as 'px')
fig = px.line(
year_counts,
x='Construction year',
y='Count',
title='Number of listings by construction year', # Set the title of the plot
markers=True, # Show markers at data points
hover_name='Construction year', # Display 'Construction year' when hovering over data points
hover_data={'Count': True}, # Show the count in the hover tooltip
color_discrete_sequence=px.colors.qualitative.Set1, # Define a color scheme
)
# Increase the size of the markers on the line plot
fig.update_traces(marker=dict(size=10))
# Update the layout of the plot
fig.update_layout(
plot_bgcolor='white', # Set the background color of the plot
paper_bgcolor='white', # Set the background color of the entire plot area
title_font=dict(size=24, color='black'), # Customize the title font size and color
title_x=0.5, # Center the title
)
# Customize the hover tooltip format
fig.update_traces(
hovertemplate="Construction year: %{x}<br>Count: %{y:.0f} <extra></extra>"
)
# Customize the x-axis tick values and labels
fig.update_xaxes(
tickmode='array', # Use a fixed set of tick values
tickvals=[year for year in range(min(year_counts['Construction year']), max(year_counts['Construction year']) + 1, 2)], # Set the tick values
ticktext=[year for year in range(min(year_counts['Construction year']), max(year_counts['Construction year']) + 1, 2)] # Set the tick labels
)
# Display the plot
fig.show()
The line graph above plots the number of New York City listings for each construction year from 2003 to 2022. In terms of average price by construction year, 2008 had the highest average price of the 20-year span at 639, while 2019 had the lowest at 614.
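A per-year average price like the one quoted above could be computed with a groupby. This is a minimal sketch on made-up toy data: the column names follow the notebook, while `toy` and `avg_price` are hypothetical names introduced here.

```python
import pandas as pd

# Toy data standing in for the cleaned 'aib' DataFrame (values are made up)
toy = pd.DataFrame({
    "Construction year": [2003, 2003, 2008, 2008, 2019],
    "price": [600, 620, 650, 630, 610],
})

# Mean price per construction year, sorted chronologically
avg_price_by_year = (
    toy.groupby("Construction year")["price"]
    .mean()
    .reset_index(name="avg_price")
    .sort_values("Construction year")
)
print(avg_price_by_year)
```

The resulting frame could be passed to `px.line` with `x='Construction year'` and `y='avg_price'` in the same way as the count plot.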
# Define custom colors for the bar chart
neighbor_palette = ['#FF5733', '#FFC300', '#DAF7A6']
# Calculate the average prices for each combination of 'neighbourhood group' and 'room type' and reset the index
avg_prices = aib.groupby(['neighbourhood group', 'room type'])['price'].mean().reset_index()
# Get unique room types
room_types = avg_prices['room type'].unique()
# Create a subplot with 2 rows and 2 columns, and set the figure size
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(9, 8))
# Iterate over each room type and create a bar chart for each
for i, room_type in enumerate(room_types):
    ax = axes[i // 2, i % 2] # Select the current subplot
    subset = avg_prices[avg_prices['room type'] == room_type] # Subset data for the current room type
    # Create a bar chart with neighborhood group on the x-axis and average price on the y-axis
    ax.bar(subset['neighbourhood group'], subset['price'], color=neighbor_palette)
    # Set the title, x-axis label, y-axis label, and add a horizontal grid
    ax.set_title(f'Room Type: {room_type}')
    ax.set_xlabel('Neighborhood')
    ax.set_ylabel('Avg Listings Prices')
    ax.grid(axis='y', linestyle='--', alpha=0.7) # Add a horizontal grid with dashed lines
    ax.set_axisbelow(True) # Ensure the grid is behind the bars
# Adjust the layout of the subplots for better spacing
plt.tight_layout()
# Set the main title for the entire figure
plt.suptitle('Avg Listings Prices by Neighborhood and Room Type', fontsize=16)
# Adjust the position of the main title to avoid overlap with subplots
plt.subplots_adjust(top=0.9)
# Display the plot
plt.show()
The average nightly Airbnb prices in New York City vary depending on the neighborhood group and the type of accommodation. In the 'Entire Home/Apt' category, Staten Island stands out with the highest average price at around 625, while other neighborhoods maintain an average price of approximately 610. For 'Shared Room' accommodations, Staten Island continues to lead with an average price of about 710. Conversely, in the 'Private Room' category, Staten Island offers the lowest average price, which hovers around 600. Finally, in the 'Hotel Room' category, with only three neighborhoods listed in the Airbnb dataset, Brooklyn takes the lead with the highest average price at approximately 720.
sorted_price = aib.sort_values('price', ascending=False)
top_5 = sorted_price.head(5)
top_5
| id | NAME | host_identity_verified | host name | neighbourhood group | neighbourhood | lat | long | instant_bookable | cancellation_policy | room type | Construction year | price | service fee | minimum nights | number of reviews | review rate number | calculated host listings count | availability 365 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5207 | 3877162 | Bushwick Room w/ Private Entrance & Bathroom! | unconfirmed | Julie | brookln | Bushwick | 40.70322 | -73.92913 | True | strict | Private room | 2020 | 1200 | 240 | 1 | 16 | 1 | 5 | 30 |
| 20343 | 12236775 | Lovely apartment in Williamsburg | verified | Harry | brookln | Greenpoint | 40.72253 | -73.94350 | True | flexible | Private room | 2020 | 1200 | 240 | 7 | 6 | 2 | 1 | 62 |
| 17080 | 10434620 | West 50th Street, Luxury Svcd Studio Apt | verified | Ken | manhattan | Hell's Kitchen | 40.76294 | -73.98574 | True | flexible | Entire home/apt | 2009 | 1200 | 240 | 30 | 1 | 4 | 87 | 329 |
| 75053 | 42453108 | Cozy room in bright, spacious apartment | unconfirmed | Steven | bronx | Hunts Point | 40.81731 | -73.89052 | False | moderate | Private room | 2003 | 1200 | 240 | 21 | 0 | 2 | 4 | 341 |
| 50535 | 28911817 | Stylish Petite Private Room in Brooklyn | verified | Shana | brookln | Bedford-Stuyvesant | 40.67842 | -73.91024 | False | moderate | Private room | 2020 | 1200 | 240 | 2 | 24 | 2 | 1 | 365 |
Create a horizontal bar chart to display the top 10 most expensive neighborhoods in the dataset. Create another chart with the 10 cheapest neighborhoods in the dataset. Create a box and whisker chart that showcases the price distribution of all listings split by room type.
# Create subplots with two rows and one column
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)
# Data for the most expensive neighborhoods chart
top_neighbourhoods_expensive = aib.groupby('neighbourhood')['price'].median().nlargest(10).reset_index()
top_neighbourhoods_expensive = top_neighbourhoods_expensive.sort_values(by='price', ascending=True)
# Data for the least expensive neighborhoods chart
top_neighbourhoods_cheap = aib.groupby('neighbourhood')['price'].median().nsmallest(10).reset_index()
top_neighbourhoods_cheap = top_neighbourhoods_cheap.sort_values(by='price', ascending=False)
# Create the first subplot (most expensive neighborhoods)
trace1 = go.Bar(
x=top_neighbourhoods_expensive['price'],
y=top_neighbourhoods_expensive['neighbourhood'],
orientation='h',
marker=dict(color='skyblue'),
name='Most Expensive'
)
# Create the second subplot (least expensive neighborhoods)
trace2 = go.Bar(
x=top_neighbourhoods_cheap['price'],
y=top_neighbourhoods_cheap['neighbourhood'],
orientation='h',
marker=dict(color='lightgreen'),
name='Least Expensive'
)
# Add the traces to the subplots
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=2, col=1)
# Update layout for both subplots
fig.update_layout(
title_text="Top Ten Neighborhoods Comparison",
title_x=0.5,
showlegend=True, # Show legends
legend=dict(x=1, y=0.5), # Adjust legend position
plot_bgcolor='white',
paper_bgcolor='white',
height=800,
)
# Update labels for both subplots
fig.update_xaxes(title_text="Price [$]", row=2, col=1)
fig.update_yaxes(title_text=". Neighbourhood\n\n\n", row=2, col=1)
# Show the combined subplot
fig.show()
The horizontal bar plots show the median Airbnb price per night for the ten most expensive and the ten least expensive neighbourhoods in New York City. New Dorp had the highest median price per night, approximately 1,044, followed closely by Staten Island at 1,042. At the other end, Lighthouse Hill had the lowest median price at just 127. Such large differences between neighbourhoods may reflect several factors, including the quality of the rentals, proximity to local attractions, and seasonal fluctuations.
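The box-and-whisker chart of price by room type is not produced by the code above. As a sketch of what such a chart summarises, the five-number summary behind each box can be computed with the standard library. The prices below are made up and `five_number_summary` is a hypothetical helper; on the real data, a call along the lines of `px.box(aib, x='room type', y='price')` would draw the chart itself.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Made-up prices for two room types, for illustration only
prices_by_room_type = {
    "Private room": [50, 100, 150, 200, 250],
    "Entire home/apt": [300, 400, 500, 600, 1200],
}

for room_type, prices in prices_by_room_type.items():
    print(room_type, five_number_summary(prices))
```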
# Create a scatter map using Plotly Express
fig = px.scatter_mapbox(aib, # DataFrame containing data
lat="lat", # Latitude column
lon="long", # Longitude column
opacity=0.3, # Set marker opacity
hover_name="neighbourhood group", # Show 'neighbourhood group' when hovering
hover_data=["neighbourhood group", "price"], # Additional data to display on hover
color="price", # Color markers based on 'price' column
color_continuous_scale='Viridis_r', # Choose color scale
title="Price comparing in the map", # Set the title of the plot
template="plotly", # Choose the plot template
zoom=10 # Set the initial zoom level
)
# Increase the size of the plot and customize other layout options
fig.update_layout(
mapbox_style="open-street-map", # Choose mapbox style
margin={"r": 10, "t": 50, "l": 10, "b": 10}, # Set plot margin
font=dict(size=17, family="Franklin Gothic"), # Customize font
height=600 # Set the height of the plot
)
# Re-apply the initial zoom level on the mapbox subplot
fig.update_mapboxes(
zoom=10, # Set the initial zoom level
)
# update_geos only affects 'geo' subplots, so it has no effect on this mapbox figure
fig.update_geos(
projection_type="mercator", # Mercator projection (applies to geo subplots only)
showcoastlines=True, # Show coastlines (applies to geo subplots only)
)
# Further layout adjustments
fig.update_layout(
mapbox_style="open-street-map", # Choose mapbox style
margin={"r": 8, "t": 54, "l": 8, "b": 10}, # Set plot margin
font=dict(size=17, family="Franklin Gothic"), # Customize font
height=600, # Set the height of the plot
title="Price comparing in the map", # Set the title of the plot
title_x=0.5, # Set title's x position to the center
title_y=0.95 # Set title's y position to the top
)
# Display the plot
fig.show()
The price map shows how Airbnb rental prices are distributed across New York City, with Manhattan having the lowest prices and Staten Island the highest. On the map, darker colors indicate higher prices, while lighter colors represent lower prices. Hovering over a listing displays its latitude and longitude, neighbourhood group, and price, providing a comprehensive overview of the rental landscape.
# Clean and preprocess the 'house_rules' column
aib1['house_rules'] = aib1['house_rules'].fillna('') # Replace NaN values with empty strings
house_rules_text = " ".join(aib1['house_rules'].astype(str))
# Create a WordCloud object with a larger size
wordcloud = WordCloud(width=1000, height=600, background_color='white').generate(house_rules_text)
# Create a centered figure
plt.figure(figsize=(12, 7))
# Calculate center position
x_centered = (plt.gca().get_xlim()[1] - plt.gca().get_xlim()[0]) / 2.0
y_centered = (plt.gca().get_ylim()[1] - plt.gca().get_ylim()[0]) / 2.0
# Display the Word Cloud in the center
plt.imshow(wordcloud, interpolation='bilinear', extent=[x_centered - 500, x_centered + 500, y_centered - 300, y_centered + 300])
plt.axis('off')
plt.title('House Rules')
plt.show()
Airbnb hosts typically aim to provide guests with a comfortable and convenient place to stay, and their house rules reflect what matters most to them. The most common words in the house rules relate to the type of property, smoking, check-in time, the house, pets, and amenities. Hosts can use the word cloud to identify the most important words and phrases to include in their own listings.
review_rate_counts = aib['review rate number'].value_counts()
# Define the slice to explode (the rating of 1 in this case)
explode_slice = 1 # The index holds integers after the earlier astype(int) conversion
# Create a custom color palette for the slices
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# Create a list to specify the pull amount for each slice
explode_values = [0.1 if label == explode_slice else 0 for label in review_rate_counts.index]
# Create a doughnut chart using Plotly with a white border and exploded slice
fig = go.Figure(data=[go.Pie(
labels=review_rate_counts.index,
values=review_rate_counts,
pull=explode_values, # Pull the selected slice away from the center
marker=dict(
line=dict(color='white', width=2), # Set white border
colors=colors # Apply the custom color palette
),
hole=0.4, # Adjust the size of the center hole if desired
domain={"x": [0, 0.5]}, # Draw the chart in the left half of the figure
)])
# Customize the layout
fig.update_layout(
title_text="Distribution of Review Rates",
title_x=0.17, # Place the title above the chart in the left half
legend=dict(x=0.5, y=0.3), # Adjust the legend position (x and y coordinates)
)
# Show the pie chart
fig.show()
The doughnut chart shows the distribution of review ratings. Ratings from 2 to 5 are evenly spread, each accounting for approximately 22.7% of the total, while a rating of 1 makes up only 8.69%. Although the lowest rating is the least common, a noteworthy portion of guests still leaves low ratings, so there is room for improvement.
review_rate_per_neighbourhood_group = aib.groupby('neighbourhood group')['review rate number'].mean()
availability_per_neighbourhood_group = aib.groupby('neighbourhood group')['availability 365'].mean()
# Both Series share the same index (neighbourhood group names)
# Create subplots with 1 row and 2 columns
fig = make_subplots(rows=1, cols=2, subplot_titles=("Average Review Rate", "Average Availability"))
# Add the first bar plot for average review rate
trace1 = go.Bar(
x=review_rate_per_neighbourhood_group.index,
y=review_rate_per_neighbourhood_group.values,
text=[str(round(i, 2)) for i in review_rate_per_neighbourhood_group.values],
marker=dict(color=px.colors.sequential.algae),
name="Review Rate"
)
# Add the second bar plot for average availability
trace2 = go.Bar(
x=availability_per_neighbourhood_group.index,
y=availability_per_neighbourhood_group.values,
text=[str(round(i)) for i in availability_per_neighbourhood_group.values],
marker=dict(color=px.colors.sequential.deep),
name="Availability"
)
# Add the traces to the subplots
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)
# Update the layout for the entire figure
fig.update_layout(
title_text="Average Review Rate and Availability per Neighbourhood Group",
font=dict(size=20, color='white', family='Avenir'),
template='plotly_dark',
showlegend=False # Hide the legend as it's not needed in this case
)
# Show the merged plot
fig.show()
The bar plots display the average review rate (left) and the average availability (right) per neighbourhood group in New York City. Staten Island had both the highest average review rate, 3.39 out of 5, and the highest availability, 195 days out of 365. Brooklyn had the lowest average review rate at 3.27 and also the lowest availability, with only 122 days out of 365. Queens and the Bronx had similar review rates, 3.33 and 3.34 respectively, and availability of 158 and 177 days.
#I will categorize the price
aib["price"].median()
625.0
# Define a function 'cato_price' that categorizes prices around a cut-off of 600
def cato_price(p):
    if p > 600:
        return "More than 600"
    else:
        return "600 or less"
# Apply the 'cato_price' function to each value in the 'price' column
aib["cat_price"] = aib["price"].apply(cato_price)
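The median computed above is 625, while the categorizer uses a fixed cut-off of 600. If the intent is to split listings at the median, the cut-off could be derived from the data instead. This is a sketch under that assumption; `make_price_categorizer` is a hypothetical helper and the prices are made up.

```python
from statistics import median

def make_price_categorizer(threshold):
    """Build a function that labels a price relative to the given threshold."""
    def categorize(price):
        if price > threshold:
            return f"More than {threshold:g}"
        return f"{threshold:g} or less"
    return categorize

# Made-up prices whose median happens to be 625
prices = [500, 600, 625, 700, 1200]
threshold = median(prices)
categorize = make_price_categorizer(threshold)

print(threshold)
print([categorize(p) for p in prices])
```

Splitting at the median gives two groups of roughly equal size, which tends to make the parallel categories plot easier to read than an arbitrary round number.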
# Define the columns to be used for the parallel categories plot
labels = {"host_identity_verified": "Host Identity Verified",
"neighbourhood group": "Neighbourhood Group",
"room type": "Room Type",
"cat_price": "Price"}
# Create a parallel categories plot using Plotly Express
fig = px.parallel_categories(
aib,
dimensions=["host_identity_verified", "neighbourhood group", "room type", "cat_price"], # Specify the columns to be used
labels=labels # Rename column labels for the plot
)
# Increase the figure size
fig.update_layout(
width=1000, # Set the width of the plot
height=400 # Set the height of the plot
)
# Display the plot
fig.show()
In a parallel categories (or parallel sets) plot, each row of the data frame is grouped with other rows that share the same values of the dimensions and then plotted as a polyline mark through a set of parallel axes, one for each dimension. There are four axes in the graph: "Host Identity Verified," "Neighbourhood Group," "Room Type," and "Price." Each axis has two to six categorical values. Hovering over a specific combination displays its count, for example the number of listings of a given room type in a particular neighbourhood group. The plot also shows whether the host's identity is verified or unconfirmed and which price category the listings fall into, as well as the totals for individual categories, such as how many listings are in Brooklyn or how many private rooms are available.
Link to the data source :- https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata
In our exploration of the Airbnb dataset for New York City, we've uncovered some intriguing insights:
Popular Destination: Manhattan emerges as the hottest Airbnb destination, boasting the highest number of listings. With a balanced mix of room types, reasonable prices, and favorable ratings, it's a prime choice for travelers.
Price Variations: Prices vary by location, accommodation type, and availability. Staten Island leads in average price for entire homes/apartments and shared rooms, yet also offers the most affordable private rooms. For hotel rooms, Brooklyn takes the lead.
Host Rating Secrets: The overall average rating hovers around 3.34, with Staten Island standing out at 3.39. Ratings may be influenced by price, neighbourhood (for example, those with the best views), and house rules such as smoking and pet policies, which points to the importance of clear guidelines for guests.
Our journey through this dataset has shed light on the dynamics of Airbnb in New York City, revealing valuable insights for both travelers and hosts.
Thank you for joining us on this data-driven adventure!